Statistics of pauses in Polish speech are described as a potential source of biometric information for automatic speaker recognition. The use of three main types of acoustic pauses (silent, filled and breath pauses) and of syntactic pauses (punctuation marks in speech transcripts) was investigated quantitatively in three types of spontaneous speech (presentations, simultaneous interpretation and radio interviews) and in read speech (audio books). Selected pause parameters, extracted for each speaker separately or for speaker groups, were examined statistically to verify the usefulness of pause information for speaker recognition and speaker profile estimation. The features that characterized speakers best were the quantity and duration of filled pauses, audible breaths, and the correlation between the temporal structure of speech and the syntactic structure of the spoken language. An experiment using pauses in a speaker biometry system (based on a Universal Background Model and i-vectors) resulted in a 30 % equal error rate. Adding pause-related features to the baseline Mel-frequency cepstral coefficient system did not significantly improve its performance. In an experiment on automatic recognition of the three types of spontaneous speech, we achieved 78 % accuracy using a GMM classifier. Silent pause-related features made it possible to distinguish between read and spontaneous speech with 75 % accuracy using extreme gradient boosting.